

AFOSR  69- 0405TK 


CONFIDENCE  TESTING  AT  THE  OFFICER  TRAINING  SCHOOL, 
LACKLAND  AIR  FORCE  BASE:  SEPTEMBER  1968 

Emir H. Shuford, Jr. and H. Edward Massengill

Sponsored  by 

Advanced  Research  Projects  Agency 
ARPA  Order  No.  833 


This document has been approved for public release and sale;
its distribution is unlimited.




BACKGROUND 

On the morning of 13 September 1968, the Air Training Command of the United
States Air Force conducted a preliminary tryout of Valid Confidence testing
by using the method to administer an achievement test to 98 officer candidates
of the Officer Training School (OTS), Lackland Air Force Base, Texas. The
experiment was observed by Major Donald W. Jones and Captain J. R. Schville of
Headquarters, Air Training Command, while Dr. Shuford and Mr. Massengill of The
Shuford-Massengill Corporation assisted Mr. Anthony P. Barra, Director of
Testing at OTS, in administering a unit test on the leadership curriculum at
OTS. This test, with alternate forms designated L1-2 and L1-2A, was a revised
version of the test previously taken by these same officer candidates as part
of the normal course of instruction and evaluation at OTS.

The original intent had been to readminister a test that these officer
candidates had taken the day before. It turned out, however, that they had been
briefed on the results of that test and by that morning had almost complete
confidence in the correct answers to all the questions. Administering a test
under these conditions would not have required students to demonstrate that
they could discriminate according to the quality of the information available
to them, so it was decided instead to readminister the leadership test that had
been given some weeks earlier. By then, of course, the students could be
expected to have forgotten some of the correct answers and, in some instances,
would not be justified in having complete confidence in an answer.


PROCEDURE 

A tape recording was used to instruct the students on how to take a Valid
Confidence test. After the tape was played, Mr. Massengill answered questions
from the students, and the students took a short practice test using the
SCoRule and one of the answer sheets. Instructions and a short break took
approximately one hour. The students then returned to the auditorium and
responded to the 58 four-alternative multiple-choice test items using the
SCoRule and three additional answer sheets. They were allowed one and one-half
hours to complete the test, the time normally allowed at OTS for the
administration of a unit test.

On completing the test, each student wrote the time on his answer sheet and
passed in the test booklet, his answer sheets, and the SCoRule. The
distribution of finishing times is shown in Figure 1. About 4/5 of the students
finished well ahead of the time limit, and about 1/5 required the complete
period to finish the test.

The distribution of finishing times for students taking the test as a choice
test is not available to us, but even without these comparative data some
conclusions can be reached. It must take longer to write down a degree of
confidence for each of the answers than to choose among the answers and
indicate that choice, and confidence testing tends to make students think more
carefully about test questions. Even so, the data shown in Figure 1 indicate
that it is quite feasible to give a Valid Confidence test within the time
limits usually allowed for choice tests, because most of the students finished
early and had time left over before the limit was reached. In shifting from
choice to confidence testing, these students would take somewhat longer to
complete the test and, as a consequence, would have less time left over before
reaching the time limit, while the 1/5 of the class that used the full time
might act the same no matter what method of testing is used or how much time is
provided. Notice that the distribution of finishing times clusters around 50 to
60 minutes and tails off to shorter and longer times. The students taking the
full time to finish the test seem not to be part of this distribution; that is,
they do not represent a truncation of the tail of the distribution, as
evidenced by the gap of 15 minutes between the tail of the main distribution
and the time limit of the test.

Thus, if these students really understood the instructions and realized what
confidence testing is all about, then we can say that, with a one-time
investment of one hour instructing students on how to take a test this way, it
is possible to administer a test as a Valid Confidence test while allowing no
more time than was previously allowed for administering it as a choice test.


DID  THE  STUDENTS  UNDERSTAND  THE  INSTRUCTIONS? 

The test data were analyzed by The Shuford-Massengill Corporation using test
keys provided by the Officer Training School, Lackland Air Force Base. As
described in Shuford & Massengill (1968) and Shuford (1969), there is a basic
test for the meaning and validity of confidence that can indicate whether the
students understood the instructions for the Valid Confidence testing
procedures. The short form of this validity test is to see whether a student's
percent "Z" answers correct is indeed greater than the expected percent correct
answers inferred for his taking the test as a choice test. Both statistics can
be computed for each student. The percent "Z" answers correct is obtained by
counting the total number of times the student placed complete confidence in an
answer by assigning a "Z" to it, and then finding the percent of those answers
that were actually correct.
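The first of the two statistics can be sketched as follows. This is an illustration only, under a hypothetical encoding that is not taken from the report: each item's response is a list of four confidences (one per alternative) summing to 1.0, and a "Z" is represented as a confidence of 1.0 on a single alternative. The function name `percent_z_correct` is likewise illustrative.

```python
import math
from typing import Sequence

# Hypothetical encoding (an assumption, not the report's answer-sheet format):
# one list of confidences per item, one value per alternative, summing to 1.0.
# A "Z" (complete confidence) is a confidence of 1.0 on a single alternative.
Response = Sequence[float]


def percent_z_correct(responses: Sequence[Response], keys: Sequence[int]) -> float:
    """Percent of the student's "Z" answers that match the scoring key.

    `keys[i]` is the index of the correct alternative for item i.
    Returns NaN when the student never assigned a "Z".
    """
    z_total = 0
    z_correct = 0
    for confidences, key in zip(responses, keys):
        for alt, conf in enumerate(confidences):
            if math.isclose(conf, 1.0):  # complete confidence in this alternative
                z_total += 1
                if alt == key:
                    z_correct += 1
    if z_total == 0:
        return math.nan
    return 100.0 * z_correct / z_total


# Example: three "Z" answers, two of which are correct.
student = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
key = [0, 2, 0, 3]
print(percent_z_correct(student, key))
```

Returning NaN rather than zero keeps a student who never used "Z" from appearing to have used it badly.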

The inferred expected correct answers may be found by using the student's
confidence responses to infer what choice he would have made if the test had
been administered as a choice test. (The underlying assumption here is that the
student would have chosen the answer in which he had the greatest confidence.)
This is easily determined except when two or more answers are tied for highest
confidence. In these cases, the student is given an expected item choice score.
For example, when the correct answer and one other answer are tied for highest
confidence, the student is given 1/2 point; when the correct answer and two
other answers are tied, he is given 1/3 point; and so on. These inferred item
choice scores of one full point, 1/2 point, 1/3 point, 1/4 point, or zero
points are then summed to obtain a raw choice score, which is divided by the
total number of items (in this case, 58) to obtain the inferred expected
correct answers plotted in Figure 2.
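The tie-splitting rule above can be sketched as follows. This uses the same hypothetical encoding of confidences per alternative, which is an assumption and not the report's format. The function names are illustrative, and the result is expressed as a percent so that it can be compared directly with percent "Z" answers correct.

```python
import math
from typing import Sequence

Response = Sequence[float]  # hypothetical: one confidence per alternative


def inferred_raw_choice_score(responses: Sequence[Response], keys: Sequence[int]) -> float:
    """Sum of inferred item choice scores (1, 1/2, 1/3, 1/4, or 0 per item).

    Assumes the student would have chosen his highest-confidence answer; when
    k answers tie for highest confidence and the correct one is among them,
    the item earns 1/k of a point, otherwise 0.
    """
    raw = 0.0
    for confidences, key in zip(responses, keys):
        top = max(confidences)
        tied = [alt for alt, conf in enumerate(confidences) if math.isclose(conf, top)]
        if key in tied:
            raw += 1.0 / len(tied)
    return raw


def inferred_expected_percent_correct(responses: Sequence[Response], keys: Sequence[int]) -> float:
    """Raw inferred choice score divided by the number of items, as a percent."""
    return 100.0 * inferred_raw_choice_score(responses, keys) / len(keys)


student = [
    [0.7, 0.3, 0.0, 0.0],  # clear top answer, and it is correct: 1 point
    [0.5, 0.5, 0.0, 0.0],  # two-way tie that includes the key: 1/2 point
    [0.4, 0.2, 0.2, 0.2],  # top answer is wrong: 0 points
    [1 / 3, 1 / 3, 1 / 3, 0.0],  # three-way tie that includes the key: 1/3 point
]
key = [0, 1, 1, 2]
print(inferred_raw_choice_score(student, key))
```

Under the short-form validity test, a student passes when his percent "Z" answers correct exceeds this inferred expected percent correct. In Figure 2, that corresponds to a point lying above the diagonal line.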

Notice that the data for every one of the 98 students pass this test, as
indicated by all points falling above the diagonal line; that is, the
confidence responses yield more information than the choices do. Notice also
the truly great individual differences in this measure, ranging from the one
student who made the top choice test score but whose confidence test data
yielded very little more information than his choice test data, up to the 13
students who showed perfection in their use of the "Z" response in Valid
Confidence testing. In summary, Figure 2 indicates that the instructions were
learned well by at least a great majority of the students.


EXTERNAL  VALIDITY  ANALYSIS 

To understand the implications of this validity test further, it is necessary
to examine in detail the test data for some of these students, particularly
those at the extremes. The five students whose data points are circled in
Figure 2 and fall nearer the diagonal line are students whose confidence data
tell us not much more than their choice test data would. By contrast, the five
students whose data points are circled at the 100% level near the top of
Figure 2 are students whose confidence test data tell us much more than their
choice test data would. This gives two extreme groups of students.

We can do a full analysis of the test data of these students by finding a
percent correct for each possible assignment of degree of confidence and
plotting it as shown in Figures 3 and 4. In each graph the data have been
averaged over the group of five students. The dashed line has been derived from
the inferred choice test data and indicates what would happen if these students
had given no more information in their confidence responses than in their
choice responses, while the diagonal line represents perfection.
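Tabulating percent correct for each degree of confidence could be sketched as follows. This again uses the hypothetical confidence-per-alternative encoding, and `calibration_table` is an illustrative name. The report does not say exactly how the five students' data were averaged. Pooling their responses into one list, as this sketch assumes, is one simple approximation; averaging each student's percentages would weight the students differently.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Response = Sequence[float]  # hypothetical: one confidence per alternative


def calibration_table(
    responses: Sequence[Response], keys: Sequence[int], ndigits: int = 2
) -> Dict[float, Tuple[int, float]]:
    """For each degree of confidence used, return (count, percent correct).

    Every alternative of every item contributes one assignment, including
    alternatives given 0% confidence. Perfect realism means the percent
    correct at each level equals 100 times that level (the diagonal line).
    """
    counts: Dict[float, int] = defaultdict(int)
    hits: Dict[float, int] = defaultdict(int)
    for confidences, key in zip(responses, keys):
        for alt, conf in enumerate(confidences):
            level = round(float(conf), ndigits)  # merge nearly equal values
            counts[level] += 1
            if alt == key:
                hits[level] += 1
    return {lvl: (counts[lvl], 100.0 * hits[lvl] / counts[lvl]) for lvl in sorted(counts)}


# Pooled responses of a (hypothetical) group, one list per item.
pooled = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
print(calibration_table(pooled, [0, 1]))
```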

Figure 3 shows the average behavior of the five students whose confidence
responses yielded minimal gain in information over choice testing. The
empirical function (the data points connected by the bold straight line) does
indeed have a steeper slope than the dashed line, indicating that even the
students at this extreme are giving more information with their confidence
responses. Notice that the data points fall fairly close to the diagonal line
except at extremely high and extremely low degrees of confidence. Over the
middle of the range of confidence, therefore, these students evaluate the
quality of the information available to them fairly realistically; they
neither overvalue nor undervalue the confidence justified by the information at
hand. When, however, a fairly high degree of confidence is justified, these
students tend to "go all the way" and place 100% confidence on the answer
rather than the 80% or 90% that is probably justified. Likewise, at the other
extreme, when the information is such that a student can almost exclude an
answer as a logical possibility, the students again "go all the way" and put
0% confidence on the answer rather than the 5% or 10% that is probably
justified.

Now look at Figure 4, which shows the average data for five of the students
whose confidence responses yield maximal gain in information over choice
testing. The empirical line varies both above and below the diagonal line,
undoubtedly because of random fluctuation resulting from the small sample sizes
behind the data points. A theoretical function fitted to these data points
would undoubtedly be a straight line with a slope very close to one, almost
identical to the diagonal line representing perfection in evaluating the
quality of information. The data of these five students represent exceptional
realism and deviate only trivially from ideal performance. One wonders whether
the exceptional ability of these students in evaluating information also
manifests itself in content areas other than leadership and in forms of
behavior other than test-taking.
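The report does not say how such a theoretical function would be fitted. One plausible approach, assumed here for illustration, is a straight line fitted by least squares with each point weighted by the number of confidence assignments behind it. Weighting in this way means that sparsely used confidence levels, whose percent correct fluctuates most, pull less on the line. `weighted_line_fit` is a hypothetical name.

```python
from typing import Iterable, Tuple

# (confidence in percent, percent correct, number of assignments at that level)
Point = Tuple[float, float, int]


def weighted_line_fit(points: Iterable[Point]) -> Tuple[float, float]:
    """Weighted least-squares line through calibration points.

    Returns (slope, intercept). Perfect realism corresponds to slope 1 and
    intercept 0, the diagonal line of Figures 3 and 4.
    """
    pts = list(points)
    w = sum(n for _, _, n in pts)
    mx = sum(n * x for x, _, n in pts) / w
    my = sum(n * y for _, y, n in pts) / w
    sxx = sum(n * (x - mx) ** 2 for x, _, n in pts)
    sxy = sum(n * (x - mx) * (y - my) for x, y, n in pts)
    slope = sxy / sxx
    return slope, my - slope * mx


print(weighted_line_fit([(0.0, 0.0, 5), (50.0, 50.0, 2), (100.0, 100.0, 1)]))
```

The input points can come from any percent-correct-by-confidence tabulation, with confidence levels converted to percent.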
REALISM IN EVALUATING THE QUALITY OF INFORMATION


A student's skill at evaluating the quality of information, as reflected in
this type of analysis, is an entirely new ability measure available only from
confidence testing. To the extent that this skill proves to be stable and
characteristic of the individual over several domains of behavior, its
existence as an ability would be confirmed. This in turn might have
far-reaching implications. For example, this ability is different from how
much information or knowledge an individual possesses. He can possess a great
deal of information but be very poor at evaluating its quality, or he can
possess very little information but be quite expert at evaluating its quality.
No matter how much information an individual possesses, however, his ability
to use it effectively to make decisions of high quality remains limited by his
ability to evaluate its quality.

To see this most clearly, suppose that officers were evaluating the quality of
certain intelligence information and that this evaluation was expressed as
degrees of confidence in the existence of certain situations.

These degrees of confidence are then fed as probabilities into an information
system (possibly computer-based) that combines them with the utilities of the
possible outcomes according to the rules of mathematical decision theory in
order to recommend a course of action. This decision system applies logic and
mathematics to the data at hand to make the best possible decisions. The
effectiveness of these decisions, therefore, would be limited only by the data
the system receives in the form of degrees of confidence. The value of these
data depends not only on the information available to the officer but also on
the realism with which he evaluates that information. This latter factor is
exactly the measure we are dealing with in Valid Confidence testing.
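The report does not specify a particular decision system. The sketch below illustrates the standard decision-theoretic rule the paragraph alludes to: pick the action with the highest expected utility, with the officer's degrees of confidence used as the probabilities. The states ("present", "absent"), actions ("act", "wait"), and utility values are all hypothetical.

```python
from typing import Dict, Tuple

Probabilities = Dict[str, float]         # state -> degree of confidence
Utilities = Dict[str, Dict[str, float]]  # action -> state -> utility


def expected_utility(action: str, probs: Probabilities, utils: Utilities) -> float:
    """Probability-weighted sum of the action's utilities over all states."""
    return sum(p * utils[action][state] for state, p in probs.items())


def recommend(probs: Probabilities, utils: Utilities) -> Tuple[str, float]:
    """Return the action with the highest expected utility, and that utility."""
    best = max(utils, key=lambda a: expected_utility(a, probs, utils))
    return best, expected_utility(best, probs, utils)


# Hypothetical utilities for two actions and two states of the world.
utils = {
    "act": {"present": 10.0, "absent": -5.0},
    "wait": {"present": -8.0, "absent": 2.0},
}
print(recommend({"present": 0.7, "absent": 0.3}, utils))
print(recommend({"present": 0.2, "absent": 0.8}, utils))
```

The rule itself never changes. Only the confidence fed into it does, and moving that confidence from 0.7 to 0.2 switches the recommendation from "act" to "wait". For this reason, the realism of the officer's confidence sets a limit on the quality of the system's decisions.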

Consider two officers with exactly the same information making inputs to such
a system. Suppose one officer is nearly perfect in evaluating the quality of
information while the other cannot tell good information from bad: he doesn't
know what he knows, and he doesn't know what he doesn't know. Clearly the
decision system would perform much more effectively with inputs from the first
officer, who is able to give realistic values to the information.

We don't need a computer-based decision system to make this argument valid.
The officer could very well be making his own decisions, with the complete
system internal to the officer; it doesn't really matter. The officer can
still behave in accord with the logic and mathematics of decision theory, and
the same limitations would apply: the effectiveness of his decisions would be
limited by his ability to evaluate the quality of information. An officer's
decision-making performance could be improved either by giving him more
information or by improving his ability to evaluate information. In many
instances, decisions have to be made in situations where there is no
possibility of getting additional information. In these situations, the best
we can do for the officer is to make sure that he can realistically evaluate
the information at hand and so get the most out of it. Thus, teaching officers
to realistically evaluate the quality of information may become a major
educational and behavioral objective, one that might benefit each student's
performance not only at Officer Training School but throughout his career, in
both educational and operational settings.

The data from this one test administration clearly indicate that there are
wide individual differences among the officer candidates in their ability to
evaluate the quality of information. It is possible that these differences are
quite temporary, reflecting different understandings of the test instructions.
Experience in a public school setting where students are tested weekly or more
often, however, indicates that these individual differences do remain stable
over a long period of time. Experience also indicates that certain techniques
can be used successfully to improve the realism with which many students
evaluate information. It would seem worthwhile to investigate the possibility
that this is an ability characteristic of every individual and, further, that
it is an ability that can be taught and improved with practice.


COMPARISON  OF  TOTAL  TEST  SCORES 

Remember that these students had taken a previous version of this test some
weeks before the experimental administration of L1-2 and L1-2A. That version
was taken as a choice test as part of the normal instructional and evaluation
program at OTS. The records were retrieved for each student so that his
original score could be compared with his score from the experimental
administration. Figure 5 shows this original test score plotted against the
inferred choice score for each of the 98 students in the experimental group.
The scores are indeed correlated, but not very highly. As would be expected,
the average test score was much higher for the original administration of the
test than for the readministration some weeks later.

The original test score is compared with the Valid Confidence score in
Figure 6. As before, there is a correlation between these two sets of test
scores, but not a high one. The positive correlation in both of these figures
indicates that the original test and the experimental test measure some things
in common. This would certainly be expected and would be a minimal requirement
for any new testing method.

A more revealing comparison is the association between the inferred choice
score and the Valid Confidence score for the experimental administration,
since this relation is not obscured by retention, selection of test items, and
so on. Figure 7 indicates that the inferred choice score and the Valid
Confidence score are indeed related, with a higher correlation than before, but
the association is far from perfect. The Valid Confidence score for every
officer candidate is higher than his inferred choice score. Accepting
Massengill's (1969) argument that the Valid Confidence or information score
measures the amount of information demonstrated by the student with respect to
the test, and is thus the fair way to assess the student, it becomes apparent
that choice testing underestimates the amount of information the students
demonstrate. In addition, the choice test makes many errors in ranking the
students according to their demonstrated knowledge. For example, the student
making the highest Valid Confidence score (thus indicating the possession of
more information than anyone else in the class) is tied with four other people
for a rank of 5.5 according to the choice score. For another example, consider
the student who made a choice score of 41 and is 14th in rank according to
Valid Confidence score. Had he taken the test as a choice test, 44 students who
demonstrated less information than he did would have been given higher choice
scores, and the choice test would have ranked him as tied with 12 other
students who in fact demonstrated less information than he did. In brief,
assessing students' accomplishment according to the Valid Confidence score can
make quite a difference.
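The ranking comparisons above can be sketched as follows. A rank such as 5.5 suggests that tied students share the mean of the positions they occupy (midranks), and this sketch assumes that convention. The second function counts, for one student, the less-informed students who would receive a higher or an equal choice score. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank 1 = highest score; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # 0-based positions -> 1-based midrank
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def ranking_disagreements(
    i: int, choice: Sequence[float], info: Sequence[float]
) -> Tuple[int, int]:
    """For student i, count students with a lower information score who would
    get (a) a higher choice score and (b) the same choice score."""
    less_informed = [j for j in range(len(info)) if info[j] < info[i]]
    ranked_above = sum(1 for j in less_informed if choice[j] > choice[i])
    tied_with = sum(1 for j in less_informed if choice[j] == choice[i])
    return ranked_above, tied_with


print(average_ranks([50, 45, 45, 45, 45, 40]))
```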
TEST SCORE AND FUTURE PERFORMANCE


There are many ways to understand why the Valid Confidence score is more valid
than the choice score for assessing students. Massengill (1969) relates it to
the measure of quantity of information and gives many concrete examples of how
the information score eliminates the operation of chance in guessing and how
it rewards the student who is uninformed more than the student who is
misinformed. Another approach is to consider the nature of the test itself and
the relation of the knowledge demonstrated on the test to the student's
performance in situations outside the test administration. Suppose the test is
made up of independent bits and pieces of knowledge, and performance of related
tasks outside the test situation depends on how many of these bits and pieces
are mastered. Then a choice-type test score is appropriate. To be more
explicit, suppose a "real world" task can be performed just as well when the
student has the bits of knowledge represented by test items 1, 2, and 3 as when
he has those represented by items 1, 2, and 4. In that case we say that the
knowledge tested by the items is substitutable, and the person's performance
depends on how many bits of knowledge he has acquired. The more bits of
knowledge he has acquired, the better he is able to perform the task. This is
the type of situation best assessed by a choice test score.

We need to distinguish, however, another type of relation between the
knowledge assessed by a test and performance in a "real world" situation. Many
"real world" tasks seem to have the characteristic that, to perform them at all
successfully, you need certain items of information, and if you have not
mastered all of these items, you cannot perform the task. In particular, if you
are misinformed on one item of information, you are guaranteed to do the task
wrong. The items of information cannot be substituted for one another.
Certainly some tasks are more complex than others and require the mastery of
more items of information. As an approximation, we can say that being
misinformed on one item of information so damages the individual's performance
that he must have completely mastered several other items of information to
make up for it in his general behavior. This assumption is implied in the Valid
Confidence score. It is reflected in the fact that Valid Confidence testing
penalizes the student who has complete confidence in a wrong answer, and so
denies the logical possibility of the correct answer, much more severely than
it rewards the student who has complete confidence in the right answer. The
score received by the uninformed student is much closer to full credit than to
no credit.
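The report does not give the Valid Confidence scoring formula. The sketch below uses a truncated logarithmic item score, chosen as an assumption because it has the asymmetry just described. Under this rule, an item scores 1 for complete confidence in the correct answer and 0 for confidence at or below an assumed 1% floor. Equal confidence on four alternatives then scores about 0.70, much closer to full credit than to none. The name `item_info_score` and the value of `FLOOR` are hypothetical.

```python
import math

FLOOR = 0.01  # assumed truncation point: confidence below 1% is scored as 1%


def item_info_score(p_correct: float, floor: float = FLOOR) -> float:
    """Illustrative truncated logarithmic item score (an assumption, not the
    report's published formula). `p_correct` is the confidence the student
    placed on the correct answer."""
    p = max(p_correct, floor)
    return 1.0 + math.log(p) / math.log(1.0 / floor)


informed = item_info_score(1.0)     # complete confidence in the right answer
uninformed = item_info_score(0.25)  # equal confidence on four alternatives
misinformed = item_info_score(0.0)  # complete confidence in a wrong answer

# Fully mastered items needed to offset one misinformed item, each measured
# relative to the uninformed score.
items_to_make_up = (uninformed - misinformed) / (informed - uninformed)
print(round(uninformed, 3), round(items_to_make_up, 2))
```

Under these assumptions, roughly 2.3 fully mastered items are needed to offset one misinformed item. This is one concrete reading of "several other items". A different floor would change the exact figure but not the asymmetry.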

In summary, if the information structure assessed by a test has the
characteristic that different parts of it must be put together for the
successful performance of a task in a "real world" setting, the Valid
Confidence score is clearly more appropriate than a choice score. If the
information structure assessed by a test is composed of unrelated bits and
pieces of knowledge, then a modification of the Valid Confidence score might be
more appropriate. In the particular case of L1-2 and L1-2A, and in fact of
several other tests administered at the Officer Training School, the tests are
really batteries composed of subtests, each measuring a particular teaching
objective. A criterion score is set for each teaching objective, so that the
student must make a passing score on each subtest. If he does not, he must
study again and retake that teaching objective. If a number of teaching
objectives are failed, the student fails the test and possibly the whole
course.


COMPARISON  OF  SUBTEST  SCORES 

This policy makes it important to compare Valid Confidence scores and inferred
choice scores for the teaching objectives on L1-2 and L1-2A. This has been done
for the first two teaching objectives. Figure 8 shows the association between
the choice and confidence scores for the seven-item subtest measuring mastery
of Teaching Objective No. 1, while Figure 9 shows the same for Teaching
Objective No. 2. These data yield ample evidence that Valid Confidence scores
are not the same as choice scores and that different students would be passed
or failed under the two grading systems.

For the highest possible choice score of seven correct out of seven items, the
Valid Confidence score is of course less than or equal to the choice score of
seven. Most of the students in this category make a Valid Confidence score
somewhat smaller than seven, because they indicate less than complete
confidence in the correct answer on one or more of the seven test items. To the
extent that they have less than complete confidence in the correct answers,
their score must fall below that of a student who has complete confidence in
all the answers. At the other extreme, the students making low choice scores of
two and three correct out of seven all made considerably higher Valid
Confidence scores. This tends to be so because these students realized that
they did not know the answers to some of the questions and said so. Had they
instead indicated complete confidence in an incorrect answer, their choice
score and Valid Confidence score would have been the same. Valid Confidence
testing is rewarding them for knowing that they don't know.

In the normal use of this test, a choice score of five or more correct out of
seven items meets the teaching objective. Although students would normally
score better, it is still interesting to look at how many and which students
would fail according to the choice score and which would fail according to the
Valid Confidence score. In Teaching Objective 1, 23 students made a choice
score of less than five and thus would fail. If we use the same criterion score
of five for the Valid Confidence score, we find that 19 students would fail.
Six of these 19 students, however, would pass the choice test, while the Valid
Confidence score passes 10 students who fail the choice test.

In Teaching Objective 2, the choice test would fail 24 of the 98 students,
while only 17 would fail the Valid Confidence test. Of these 17, two would pass
the choice test, but nine students who pass the Valid Confidence test would
fail the choice test.
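The pass/fail comparison above is a two-by-two cross-classification. The sketch below tallies it from paired subtest scores using the criterion of five. The function name is hypothetical, and it assumes that a score equal to the criterion passes, as stated for the choice score.

```python
from typing import Dict, Sequence


def pass_fail_crosstab(
    choice_scores: Sequence[float], info_scores: Sequence[float], criterion: float = 5.0
) -> Dict[str, int]:
    """Cross-classify students by pass/fail under choice and Valid Confidence scoring."""
    table = {
        "pass both": 0,
        "fail both": 0,
        "fail choice only": 0,
        "fail Valid Confidence only": 0,
    }
    for choice, info in zip(choice_scores, info_scores):
        choice_pass = choice >= criterion
        info_pass = info >= criterion
        if choice_pass and info_pass:
            table["pass both"] += 1
        elif not choice_pass and not info_pass:
            table["fail both"] += 1
        elif choice_pass:
            table["fail Valid Confidence only"] += 1
        else:
            table["fail choice only"] += 1
    return table
```

The reported counts fix all four cells. For Teaching Objective 1: 19 - 6 = 13 students fail both, 6 fail only the Valid Confidence criterion, 23 - 13 = 10 fail only the choice criterion, and 98 - 13 - 6 - 10 = 69 pass both. For Teaching Objective 2, the same arithmetic gives 15, 2, 9, and 72.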

In summary, on the teaching objective subtests, students tend to make higher
Valid Confidence scores than choice scores. Although choice score and
confidence score are related, the relation is such that different students are
passed and failed depending on which method of scoring is used. These different
instructional decisions are not due to changes from test to retest or to other
sources of variability; they are inherent in the methods of scoring. From many
points of view, the Valid Confidence score can be shown to be the fair way of
assessing knowledge. If the information score is accepted as fair, the choice
score cannot be. And to the extent that the choice score passes and fails
different students, choice testing is making serious errors in the assessment
of students.


DIAGNOSIS  OF  STUDENT  ACHIEVEMENT 

Valid  Confidence  testing  can  be  used  to  obtain  a  detailed  diagnosis  of  each 
student's  strengths  and  weaknesses  with  respect  to  the  subject  matter  of  the 
test.  This  was  done  by  finding  the  student's  state  of  knowledge  based  on 
his  allocation  of  confidence  among  the  possible  answers  and  the  results  are 
shown  in  Table  1.  The  use  of  choice  testing  for  this  purpose  would  result 
in  numerous  errors  as  shown  in  Row  IV  of  this  table.  Row  V  indicates  the 
extent  to  which  guessing  was  eliminated  through  the  use  of  Valid  Confidence 
testing. 
A multiple-choice test on leadership was administered to 98 officer candidates
in residence at the Officer Training School, Lackland Air Force Base. This test
was administered using the materials and methods of Valid Confidence testing.
Less than one hour was devoted to instructing the officer candidates on how to
take a Valid Confidence test, and the normal time was then allowed for the
students to respond to the 58 test items.


Analysis  of  the  data  indicates  that  taking  a  Valid  Confidence  test  requires  no
more  time  than  is  normally  allotted  to  test  administration.  All  the  officer
candidates  understood  the  instructions  and  gave  confidence  responses  which  yielded
more  information  than  choice  responses  would  have.  Wide  individual  differences
were  observed  in  the  officer  candidates'  ability  to  realistically  evaluate  the
quality  of  information.  Since  an  officer's  ability  to  make  effective  decisions
can  be  limited  by  the  realism  with  which  he  is  able  to  evaluate  the  quality  of
information,  it  appears  worthwhile  to  conduct  further  research  to  measure  this
ability  and  to  develop  techniques  to  improve  it  in  the  officer  candidate
student  population.
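One simple way to tabulate how realistically a student evaluates information is a calibration table. For every confidence level the student used, it gives the proportion of answers at that level that turned out to be the keyed answer. For a realistic student, answers given 0.5 confidence should be correct about half the time. The sketch below assumes confidence is recorded as probabilities per answer and groups answers by the stated value rounded to two places. It is an illustration only, not the procedure behind the external validity graphs in this report.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Item = Tuple[Sequence[float], int]  # (confidence per answer, index of keyed answer)

def calibration_table(items: Sequence[Item]) -> Dict[float, Tuple[int, float]]:
    """Map each stated confidence level to (number of answers, proportion that were keyed correct)."""
    tallies: Dict[float, List[int]] = defaultdict(lambda: [0, 0])  # level -> [n, n_correct]
    for confidence, key in items:
        for j, p in enumerate(confidence):
            level = round(p, 2)          # group answers by stated confidence
            tallies[level][0] += 1
            if j == key:
                tallies[level][1] += 1
    return {level: (n, hits / n) for level, (n, hits) in sorted(tallies.items())}

items = [
    ([1.0, 0.0, 0.0, 0.0], 0),
    ([0.5, 0.5, 0.0, 0.0], 1),
    ([0.5, 0.5, 0.0, 0.0], 2),  # keyed answer received no confidence
]
for level, (n, hit_rate) in calibration_table(items).items():
    print(level, n, round(hit_rate, 3))
```

Here the answers given 0.5 confidence were correct only a quarter of the time, which is the kind of unrealistic evaluation the paragraph above describes.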

Test  scores  yielded  by  Valid  Confidence  testing  are  related  to,  but  are  not
the  same  as,  test  scores  obtained  from  choice  testing.  Several  lines  of  reasoning
lead  to  the  conclusion  that  the  Valid  Confidence  or  information  score  provides
a  fairer  basis  for  assessment  than  does  the  choice  score.  Therefore,  to  the
extent  that  choice  testing  passes  and  fails  different  students  than  does  Valid
Confidence  testing,  use  of  choice  testing  as  the  means  of  assessing  and  grading
students  leads  to  unfair  grades  and  incorrect  instructional  decisions.  Further
use  of  Valid  Confidence  testing  as  part  of  the  normal  program  of  instruction
and  evaluation  at  the  Officer  Training  School  should  give  students,  instructors,
and  those  responsible  for  evaluation  further  insight  into  the  assessment
properties  of  the  information  score.
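The difference between the two kinds of score can be illustrated by scoring the same confidence responses both ways. This minimal sketch rests on two assumptions. First, an inferred choice score credits the answer given the highest confidence, and a tie among m top answers earns 1/m, the expected value of a forced guess. Second, an information score comes from a truncated logarithmic scoring rule, 1 + log_k(p), where p is the confidence placed on the keyed answer among k options and p is floored at 0.01. The logarithmic rule is a standard proper scoring rule. Its exact form here, the floor, and the tie handling are assumptions, not the formulas used in this study.

```python
import math
from typing import Sequence, Tuple

Item = Tuple[Sequence[float], int]  # (confidence per answer, index of keyed answer)

def inferred_choice_score(items: Sequence[Item]) -> float:
    """Score as if each student had chosen the answer given the most confidence (assumed tie rule)."""
    total = 0.0
    for confidence, key in items:
        top = max(confidence)
        tied = [j for j, p in enumerate(confidence) if p == top]
        if key in tied:
            total += 1.0 / len(tied)  # expected credit for a random pick among tied answers
    return total

def log_information_score(items: Sequence[Item], floor: float = 0.01) -> float:
    """Truncated logarithmic score: 1 + log_k(p_key), with p_key floored at `floor` (assumed values)."""
    total = 0.0
    for confidence, key in items:
        k = len(confidence)
        p = max(confidence[key], floor)
        total += 1.0 + math.log(p, k)  # 1 for certainty on the key, 0 for an even spread
    return total

items = [
    ([1.0, 0.0, 0.0, 0.0], 0),
    ([0.5, 0.5, 0.0, 0.0], 0),
    ([0.25, 0.25, 0.25, 0.25], 3),
    ([0.0, 0.7, 0.3, 0.0], 0),
]
print(inferred_choice_score(items), round(log_information_score(items), 3))
```

On these made-up items, the confidently wrong fourth item simply earns nothing under the choice score but is heavily penalized by the information score. This is one way the two scores can rank students differently.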
Figure  2.  A  measure  of  the  fundamental  validity  and  existence  of  confidence.
A  data  point  falling  in  the  region  above  and  to  the  left  of  the  diagonal  line
indicates  that  the  student's  confidence  responses  are  yielding  more  information
than  choice  responses  would.  Test  data  for  the  two  groups  of  students  whose  data
points  are  circled  are  analyzed  further  in  Figures  3  and  4.
Figure  3.  External  validity  graph  based  on  the  five  students
indicated  in  Figure  2  whose  confidence  responses  yielded
minimal  gain  in  information  over  choice  testing.
Figure  4.  External  validity  graph  based  on  the  five  students
indicated  in  Figure  2  whose  confidence  responses  yielded
maximal  gain  in  information  over  choice  testing.
Figure  5.  Association  between  Original  Test  Score,  obtained  from
the  earlier  administration  of  a  previous  version  of  L1-2  or  L1-2A
as  part  of  the  routine  testing  program  of  the  Officer  Training
School,  and  Inferred  Choice  Score  obtained  from  the  experimental
administration  of  L1-2  or  L1-2A  as  a  Valid  Confidence  test.
Figure  6.  Association  between  Original  Test  Score,  obtained  from
the  earlier  administration  of  a  previous  version  of  L1-2  or  L1-2A
as  part  of  the  routine  testing  program  of  the  Officer  Training
School,  and  Valid  Confidence  Score  obtained  from  the  experimental
administration  of  L1-2  or  L1-2A.


Figure  7.  Association  between  the  score  the  student  would  have
made  if  L1-2  or  L1-2A  had  been  administered  as  a  choice  test,  and
the  Valid  Confidence  score  that  the  student  actually  received  from
the  experimental  administration.
Figure  8.  Association  between  choice  and  confidence  scores  for
the  seven-item  subtest  measuring  mastery  of  Teaching  Objective  No.  1
on  L1-2  or  L1-2A.
Figure  9.  Association  between  choice  and  confidence  scores  for
the  seven-item  subtest  measuring  mastery  of  Teaching  Objective  No.  2
on  L1-2  or  L1-2A.
Table  2.  Selected  Statistics  for  Each  Student
Table  3.  Selected  Statistics  for  Each  Student
Table  4.  Selected  Statistics  for  Each  Student
Table  5.  Selected  Statistics  for  Each  Student
Table  g.  Selected  Statistics  for  Each  Student
Table  9.  Selected  Statistics  for  Each  Student
Air  Force  Office  of  Scientific  Research  (SRLB)
1400  Wilson  Boulevard
Arlington,  Virginia  22209

A  multiple- choice  test  on  leadership  was  administered  to  98  officer  candidates 
in  residence  at  the  Officer  Training  School,  Lackland  Air  Force  Base.  Less  than  one 
hour  was  devoted  to  instructing  the  officer  candidates  on  how  to  take  a  Valid
Confidence  test,  and  the  normal  time  was  then  allowed  for  the  students  to  respond  to  the
58  test  items. 

Analysis  of  the  data  indicates  that  taking  a  Valid  Confidence  test  requires  no 
more  time  than  is  normally  allotted  to  test  administration.  All  the  officer  candidates
understood  the  instructions  and  gave  confidence  responses  which  yielded  more
information  than  choice  responses  would  have.  Wide  individual  differences  were  observed  in
the  officer  candidates'  ability  to  realistically  evaluate  the  quality  of  information. 

Test  scores  yielded  by  Valid  Confidence  testing  are  related  to,  but  are  not  the
same  as,  test  scores  obtained  from  choice  testing.  Several  lines  of  reasoning  lead
to  the  conclusion  that  the  Valid  Confidence  or  information  score  provides  a  fairer
basis  for  assessment  than  does  the  choice  score.  Therefore,  to  the  extent  that
choice  testing  passes  and  fails  different  students  than  does  Valid  Confidence  testing,
use  of  choice  testing  as  the  means  of  assessing  and  grading  students  leads  to  unfair
grades  and  incorrect  instructional  decisions.